Summary

The seminal paper of Prentice & Pyke (1979) established that the maximum likelihood estimator for the odds-ratio of a case-control study is that of a logistic regression. In other words, the incorrect prospective model is equivalent to the correct retrospective model. We identify necessary and sufficient conditions for the corresponding result in a Bayesian analysis, that is, that the posterior distribution for the odds-ratio be the same under both the prospective and retrospective likelihoods. These conditions can be used to derive a parametric family of prior laws that can be used for such an analysis.

In order to estimate the risk factors for a disease (or any other binary outcome), there are two basic approaches: a prospective or cohort study, in which subjects are selected from the population, possibly based on their risk factors, and observed to determine if the disease arises; and a case-control or retrospective study, in which random samples are taken from both the population with the disease (cases) and the population without (controls), and the relative frequencies of the risk factors in the two samples are then recorded.

Let $Y$ be the outcome variable taking values in $\{0, 1\}$, corresponding to the absence or presence of disease, respectively. Let $X$ be the vector of covariates (risk factors) taking values in $\mathcal{X} \subseteq \mathbb{R}^k$. In a prospective study we are sampling from the conditional distribution of $Y$ given $X$. Under a proportional odds assumption, the model is that of a logistic regression,

\[ p(y \mid x, \alpha, \beta) = \frac{e^{y(\alpha + \beta^T x)}}{1 + e^{\alpha + \beta^T x}}, \qquad \alpha \in \mathbb{R},\ \beta \in \mathbb{R}^k. \tag{1} \]

On the other hand, a case-control study will result in observations from the conditional distribution of $X$ given $Y$. In this case, specifying a probabilistic model becomes much more difficult, particularly if $\mathcal{X}$ is infinite.
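As a small illustration of the prospective model (1), the following Python sketch evaluates the density and confirms that the odds ratio between two covariate values depends only on $\beta$. The function names and the numerical values are illustrative assumptions, not part of the paper.

```python
import math

def logistic_density(y, x, alpha, beta):
    """Density p(y | x, alpha, beta) of model (1) for y in {0, 1}."""
    eta = alpha + sum(b * xi for b, xi in zip(beta, x))  # linear predictor
    return math.exp(y * eta) / (1.0 + math.exp(eta))

def odds(x, alpha, beta):
    """Odds of disease p(1 | x) / p(0 | x) at covariate value x."""
    return logistic_density(1, x, alpha, beta) / logistic_density(0, x, alpha, beta)

# Illustrative (hypothetical) parameter and covariate values.
alpha, beta = -2.0, [0.7, -0.3]
x_a, x_b = [1.0, 2.0], [0.0, 2.0]

# The two outcome probabilities at a fixed x sum to one.
total = logistic_density(0, x_a, alpha, beta) + logistic_density(1, x_a, alpha, beta)

# x_a and x_b differ by one unit in the first covariate only,
# so the odds ratio equals exp(beta[0]) whatever the value of alpha.
odds_ratio = odds(x_a, alpha, beta) / odds(x_b, alpha, beta)
```

The intercept cancels in the ratio, which is why $\beta$ alone is referred to as the log odds-ratio parameter.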

Despite these difficulties, case-control studies are often desirable, or in some cases unavoidable, particularly where the disease is relatively rare or the time until diagnosis is long, as the costs of obtaining a sufficient sample size for a prospective study are likely to be prohibitive. Prentice & Pyke (1979) showed that the maximum likelihood estimator of the log-odds ratio parameter $\beta$, and its asymptotic covariance, could simply be found by a logistic regression. In other words, we can use the prospective model to analyse data gathered retrospectively. This particular result has been widely applied in epidemiology and other areas.

In this paper, we identify the analogous result for the Bayesian case: that is, the conditions under which the posterior distribution for $\beta$ can be computed using the prospective likelihood instead of the retrospective.

The simplest model of a single binary covariate, where $\mathcal{X} = \{0, 1\}$, has been well explored in the literature: Zelen & Parker (1986), Nurminen & Mutanen (1987), Marshall (1988) and Ashby et al. (1993) have all characterized such an analysis, which consists of computing the posterior distribution of the log odds ratio of a 2 × 2 contingency table under a Dirichlet prior. In the case where the covariates are categorical, that is where $\mathcal{X}$ is finite, Seaman & Richardson (2004) identified a class of improper priors that satisfy the desired properties. This class was further expanded by Staicu (2010).

We show that this prospective-retrospective symmetry is due to "independence" of the parameters: the original result of Prentice & Pyke (1979) can be explained through variation independence in the parameter space, and the corresponding Bayesian result will occur when the prior law exhibits analogous probabilistic independence. Furthermore, we arrive at the same class of prior laws as Staicu (2010) via a different route, and demonstrate how they can be extended to stratified designs.

However, this is not the only approach for Bayesian analysis of case-control data. With the advent of computational tools such as MCMC, the retrospective likelihood need not present such an obstacle. Indeed this path has been well followed in the literature, as reviewed in Mukherjee et al. (2005). For example, Müller & Roeder (1997), Seaman & Richardson (2001) and Gustafson et al. (2002) have pursued this approach. In particular, Gustafson et al. (2002) note that in general the prospective posterior can serve as a useful approximation to the retrospective posterior, and use this as the basis of an importance sampling scheme.



1-1. Notation and definitions

Throughout the paper, $(X, Y)$ will denote a single joint observation from the specified model, and $(X^{(n)}, Y^{(n)})$ a sequence of $n$ such observations; $p$ will denote the density of the model (with respect to the appropriate measure), with variables indicating the context.

We recall the notation and definitions from Dawid & Lauritzen (1993). If $\theta$ denotes a joint probability distribution for $(X, Y)$, then $\theta_X$ and $\theta_Y$ will denote the corresponding marginal distributions of $X$ and $Y$ respectively. Furthermore, $\theta_{Y|X=x}$ will be the conditional distribution of $Y$ given $X = x$, and $\theta_{Y|X} = \{\theta_{Y|X=x} : x \in \mathcal{X}\}$ will be the family of all such conditional distributions, and likewise for $\theta_{X|Y}$.

A model is a set $\Theta$ of such joint probability distributions $\theta$. For any two functions $\phi, \tau$ on $\Theta$, we define the conditional range of $\phi$ given $\tau = t$ to be $\{\phi(\theta) : \theta \in \Theta \text{ and } \tau(\theta) = t\}$. Furthermore, $\phi$ is said to be variation independent of $\tau$, written $\phi \ddagger \tau$, if this is constant for all values of $t$; in other words, if $(\phi, \tau)$ takes values in a product space. In a similar manner, we can define conditional variation independence (see Dawid & Lauritzen, 1993).

A model is called strong meta Markov if

\[ \theta_X \ddagger \theta_{Y|X} \quad \text{and} \quad \theta_Y \ddagger \theta_{X|Y}. \tag{2} \]



We define a law $\mathcal{L}$ to be a probability distribution over a model. We say that a law is strong hyper Markov if we replace the variation independence of (2) with probabilistic independence (denoted by $\perp\!\!\!\perp$) under $\mathcal{L}$:

\[ \theta_X \perp\!\!\!\perp \theta_{Y|X} \quad \text{and} \quad \theta_Y \perp\!\!\!\perp \theta_{X|Y} \quad [\mathcal{L}]. \]

As variation independence is a necessary condition for probabilistic independence, a necessary 
(but not sufficient) condition for a law to be strong hyper Markov is that its support be a strong 
meta Markov model. 

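We use the relation $\phi \simeq \psi$ to denote the existence of a bijective function between $\phi$ and $\psi$. For example, we have $\theta \simeq (\theta_X, \theta_{Y|X}) \simeq (\theta_Y, \theta_{X|Y})$.

Lemma 1. For the above logistic model,

\[ \theta_{Y|X} \simeq (\alpha, \beta) \quad \text{and} \quad \theta_{X|Y} \simeq (\theta_{X|Y=0}, \beta). \]

Proof. The first equivalence follows from (1), and the second from Bayes theorem:

\[ \frac{d\theta_{X|Y=1}}{d\theta_{X|Y=0}}(x) = \frac{\theta_{Y|X=x}(1)}{\theta_{Y|X=x}(0)} \, \frac{\theta_Y(0)}{\theta_Y(1)} \propto e^{\beta^T x}. \qquad \square \]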
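The second equivalence of Lemma 1 can be checked numerically on a small finite covariate space. The sketch below is an illustration under assumed values (a hypothetical marginal law for a scalar covariate, and arbitrary $\alpha$ and $\beta$); it computes the two retrospective distributions and verifies that their ratio is proportional to $e^{\beta x}$.

```python
import math

def retrospective_densities(theta_x, alpha, beta):
    """Return (p(x | Y=0), p(x | Y=1)) on a finite covariate space.

    theta_x maps each scalar covariate value x to its marginal probability.
    """
    joint1 = {x: p * math.exp(alpha + beta * x) / (1 + math.exp(alpha + beta * x))
              for x, p in theta_x.items()}
    joint0 = {x: p - joint1[x] for x, p in theta_x.items()}
    gamma = sum(joint1.values())            # marginal probability P(Y = 1)
    cond0 = {x: v / (1 - gamma) for x, v in joint0.items()}
    cond1 = {x: v / gamma for x, v in joint1.items()}
    return cond0, cond1

theta_x = {0.0: 0.5, 1.0: 0.3, 2.5: 0.2}    # hypothetical marginal law of X
alpha, beta = -1.5, 0.8                      # hypothetical parameter values
cond0, cond1 = retrospective_densities(theta_x, alpha, beta)

# p(x | Y=1) / p(x | Y=0) divided by exp(beta * x) should not depend on x.
constants = [cond1[x] / cond0[x] / math.exp(beta * x) for x in theta_x]
spread = max(constants) - min(constants)
```

Because the constant of proportionality is fixed by normalisation, the pair $(\theta_{X|Y=0}, \beta)$ is enough to recover $\theta_{X|Y=1}$.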

The usual definition of independence does not apply in the case where $\mathcal{L}$ is improper, so instead we define $\phi \perp\!\!\!\perp \tau$ to mean that the joint density factorizes into a function of $\phi$ and a function of $\tau$. Owing to the problems of marginalising improper distributions (see Dawid et al., 1973), this only makes sense if $\theta \simeq (\phi, \tau)$.



2. Maximum likelihood estimators

Prentice & Pyke (1979) showed that the maximum likelihood odds-ratio estimators obtained from a case-control study have the same values and asymptotic properties as those arising from a prospective study; in particular, they can be computed from a prospective logistic regression. This can be demonstrated using the strong meta Markov property.

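Lemma 2. Let $\Theta \simeq \Theta_X \times \Theta_{Y|X}$, where $\Theta_X$ is the family of all probability distributions over $\mathcal{X}$, and $\Theta_{Y|X}$ the family of all conditional distributions with densities of the form in (1). Then the corresponding family of joint distributions $\Theta$ is strong meta Markov, that is,

\[ \theta_X \ddagger (\alpha, \beta) \quad \text{and} \quad \theta_Y \ddagger (\theta_{X|Y=0}, \beta). \]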

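Proof. These properties are essentially a reformulation of Müller & Roeder (1997, Lemmas 1 and 2). By definition $\theta_X \ddagger \theta_{Y|X}$. It remains to show variation independence in the opposite direction.

For any $\theta_X$ and $\theta_{Y|X}$, the joint distribution $\theta$ has a density of the form

\[ p(x, y \mid \theta) = \frac{e^{y(\alpha + \beta^T x)}}{1 + e^{\alpha + \beta^T x}} \, p(x \mid \theta_X). \]

Therefore the marginal distribution $\theta_Y$ is Bernoulli, with parameter $\gamma$ taking values on the interval $(0, 1)$, where

\[ \gamma = p(y = 1 \mid \theta_Y) = \int_{\mathcal{X}} \frac{e^{\alpha + \beta^T x}}{1 + e^{\alpha + \beta^T x}} \, p(x \mid \theta_X) \, dx, \tag{3} \]

and the conditional distribution of $X$ given $Y$ has density of the form

\[ p(x \mid y, \theta_{X|Y}) = \frac{p(x, y \mid \theta)}{\gamma^y (1 - \gamma)^{1-y}} = \frac{e^{y\{\alpha - \log\frac{\gamma}{1-\gamma} + \beta^T x\}}}{(1 - \gamma)(1 + e^{\alpha + \beta^T x})} \, p(x \mid \theta_X). \tag{4} \]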
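For any $\gamma' \in (0, 1)$, define $\theta' \simeq (\theta'_X, \theta'_{Y|X})$, where

\[ \theta'_{Y|X} \simeq (\alpha', \beta) \in \Theta_{Y|X} \quad \text{with} \quad \alpha' = \alpha - \log\frac{\gamma}{1 - \gamma} + \log\frac{\gamma'}{1 - \gamma'}, \tag{5} \]

and $\theta'_X$ has density

\[ p(x \mid \theta'_X) = \frac{(1 - \gamma')(1 + e^{\alpha' + \beta^T x})}{(1 - \gamma)(1 + e^{\alpha + \beta^T x})} \, p(x \mid \theta_X). \]

By the definition of $\gamma$ in (3), it can be shown that this integrates to 1, hence $\theta'_X \in \Theta_X$. Furthermore, by matching terms in (4), $\theta_{X|Y} = \theta'_{X|Y}$. Since $\theta'_Y \simeq \gamma'$ can be chosen arbitrarily, it follows that $\theta_Y \ddagger \theta_{X|Y}$. □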
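The construction used in this proof can be verified numerically on a finite covariate space. The sketch below is an illustration only: the covariate space, $\theta_X$, $\alpha$, $\beta$ and the target value $\gamma'$ are hypothetical, and the density of $\theta'_X$ is the reweighting of $p(x \mid \theta_X)$ by $(1-\gamma')(1+e^{\alpha'+\beta x}) / \{(1-\gamma)(1+e^{\alpha+\beta x})\}$ obtained by matching terms in the conditional density of $X$ given $Y$.

```python
import math

def joint_table(theta_x, alpha, beta):
    """Joint probabilities {(x, y): p} under the logistic model on a finite space."""
    table = {}
    for x, px in theta_x.items():
        p1 = math.exp(alpha + beta * x) / (1 + math.exp(alpha + beta * x))
        table[(x, 0)] = px * (1 - p1)
        table[(x, 1)] = px * p1
    return table

def conditional_x_given_y(table):
    """Return (gamma, {(x, y): p(x | y)}) computed from a joint table."""
    gamma = sum(p for (x, y), p in table.items() if y == 1)
    marginal = {0: 1 - gamma, 1: gamma}
    return gamma, {(x, y): p / marginal[y] for (x, y), p in table.items()}

def logit(p):
    return math.log(p / (1 - p))

theta_x = {0.0: 0.5, 1.0: 0.3, 2.5: 0.2}      # hypothetical theta_X
alpha, beta = -1.5, 0.8                        # hypothetical parameters
gamma, cond = conditional_x_given_y(joint_table(theta_x, alpha, beta))

gamma_new = 0.6                                # arbitrary target value gamma'
alpha_new = alpha - logit(gamma) + logit(gamma_new)   # shifted intercept alpha'
theta_x_new = {
    x: px * (1 - gamma_new) * (1 + math.exp(alpha_new + beta * x))
       / ((1 - gamma) * (1 + math.exp(alpha + beta * x)))
    for x, px in theta_x.items()
}

gamma_check, cond_new = conditional_x_given_y(
    joint_table(theta_x_new, alpha_new, beta))
max_diff = max(abs(cond[key] - cond_new[key]) for key in cond)
```

Here `theta_x_new` sums to one, the new marginal probability of disease is `gamma_new`, and the retrospective distributions are unchanged, which is the variation independence of $\theta_Y$ and $\theta_{X|Y}$ in miniature.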



We can use the fact that variation independence satisfies the same properties as conditional independence (Dawid & Lauritzen, 1993).

Corollary 1. Under the joint logistic model of Lemma 2,

\[ \theta_X \ddagger \alpha \mid \beta \quad \text{and} \quad \theta_Y \ddagger \theta_{X|Y=0} \mid \beta. \]

The logistic model has other variation independence properties:

Corollary 2. Under the joint logistic model of Lemma 2,

\[ (\theta_X, \theta_Y) \ddagger \beta. \]

Proof. We have $\theta_X \ddagger (\alpha, \beta)$, and for any $\theta_Y$, we can choose $\alpha'$ as in (5). □

Theorem 1. Suppose we have a joint model as in Lemma 2. Then the profile likelihood function for the odds ratio $\beta$ is the same for both the retrospective model $\Theta_{X|Y}$ and the prospective model $\Theta_{Y|X}$, up to proportionality.

Proof. This proof follows a similar argument to that of Dawid & Lauritzen (1993, Lemma 4.10). The joint density for the model $\Theta$ can be written as

\[ p(x, y \mid \theta) = p(x \mid \theta_X) \, p(y \mid x, \alpha, \beta) = p(y \mid \theta_Y) \, p(x \mid y, \theta_{X|Y=0}, \beta). \]

Therefore the profile likelihood for the joint model can be written in terms of the prospective model:

\[ L(\beta) = \max_{\alpha, \theta_X} p(x \mid \theta_X) \, p(y \mid x, \theta_{Y|X}). \tag{6} \]

By the conditional variation independence of $\alpha$ and $\theta_X$ given $\beta$ of Corollary 1, the factors of (6) can be profiled separately, so that

\[ L(\beta) \propto \max_{\alpha} p(y \mid x, \alpha, \beta) = L^{\mathrm{pro}}(\beta), \]

where $L^{\mathrm{pro}}$ denotes the profile likelihood of the prospective model. The same argument applies to the retrospective profile likelihood $L^{\mathrm{ret}}(\beta)$:

\[ L(\beta) \propto \max_{\theta_{X|Y=0}} p(x \mid y, \theta_{X|Y=0}, \beta) = L^{\mathrm{ret}}(\beta). \qquad \square \]

From this we obtain the result of Prentice & Pyke (1979):

Corollary 3. For data observed in a case-control study, the maximum likelihood estimator of the log odds parameter $\beta$ and its asymptotic covariance can be computed as if the data were observed prospectively, that is, using logistic regression.

Proof. The maximum likelihood estimator is a function of the profile likelihood, as is its asymptotic covariance (see Patefield, 1985). □
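For a single binary covariate, Corollary 3 can be illustrated with a few lines of Python. The sketch below fits the prospective logistic regression by Newton-Raphson to a 2 × 2 table of counts and compares the slope with the log cross-ratio of the table. The counts are hypothetical, and the fitting routine is a minimal illustration rather than a general-purpose implementation.

```python
import math

def fit_logistic_binary(counts, iterations=50):
    """Newton-Raphson MLE of (alpha, beta) for logit P(Y=1 | x) = alpha + beta * x.

    counts[(x, y)] is the number of subjects with covariate x in {0, 1}
    and outcome y in {0, 1}.
    """
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x in (0, 1):
            n = counts[(x, 0)] + counts[(x, 1)]
            p = 1 / (1 + math.exp(-(alpha + beta * x)))
            resid = counts[(x, 1)] - n * p     # score contribution
            w = n * p * (1 - p)                # information weight
            g_a += resid
            g_b += resid * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: add (information matrix)^{-1} times the score vector.
        alpha += (h_bb * g_a - h_ab * g_b) / det
        beta += (h_aa * g_b - h_ab * g_a) / det
    return alpha, beta

# Hypothetical case-control data: 100 cases (y = 1) and 100 controls (y = 0).
counts = {(0, 0): 80, (1, 0): 20, (0, 1): 60, (1, 1): 40}
alpha_hat, beta_hat = fit_logistic_binary(counts)
log_cross_ratio = math.log(40 * 80 / (20 * 60))

# Tripling the number of controls with the same exposure frequencies changes
# the fitted intercept but leaves the fitted slope unchanged.
counts_more_controls = {(0, 0): 240, (1, 0): 60, (0, 1): 60, (1, 1): 40}
alpha_hat2, beta_hat2 = fit_logistic_binary(counts_more_controls)
```

The intercept absorbs the sampling fractions of cases and controls, while the slope is determined by the cross-ratio alone.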



The same argument can also be applied to the value, but not the covariance, of any penalized logistic regression estimator of the form

\[ \arg\max_{\alpha, \beta} \left\{ \log p(y \mid x, \alpha, \beta) + \phi(\beta) \right\}. \]

Examples of such estimators include ridge regression, where $\phi(\beta) \propto \|\beta\|_2^2$, and lasso, where $\phi(\beta) \propto \|\beta\|_1$. Such methods have proven successful in genome-wide association studies, which involve case-control data with extremely high-dimensional covariates (Park & Hastie, 2008; Wu et al., 2009).



3. Bayesian analysis of case-control studies

We now investigate how these results correspond to a Bayesian analysis. We use $\pi$ to denote the density of the prior law, and $\pi^{\mathrm{pro}}$ and $\pi^{\mathrm{ret}}$ to denote the densities of the posterior laws $\mathcal{L}^{\mathrm{pro}}$ and $\mathcal{L}^{\mathrm{ret}}$ under prospective and retrospective likelihoods, respectively:

\[ \pi^{\mathrm{pro}}(\alpha, \beta \mid x^{(n)}, y^{(n)}) \propto \pi(\alpha, \beta) \, p(y^{(n)} \mid x^{(n)}, \alpha, \beta), \]

\[ \pi^{\mathrm{ret}}(\theta_{X|Y=0}, \beta \mid x^{(n)}, y^{(n)}) \propto \pi(\theta_{X|Y=0}, \beta) \, p(x^{(n)} \mid y^{(n)}, \theta_{X|Y=0}, \beta). \]

Furthermore, we will use $p$ to denote the density of the marginal model, where parameters have been integrated out (using the prior law), for example

\[ p(y^{(n)} \mid x^{(n)}, \beta) = \int p(y^{(n)} \mid x^{(n)}, \alpha, \beta) \, \pi(\alpha \mid \beta) \, d\alpha. \]

In other words, when interpreted as a function of $\beta$, $p(y^{(n)} \mid x^{(n)}, \beta)$ is the marginal likelihood for $\beta$.

We now present the key result of this section.

Theorem 2. Let $\mathcal{L}(\theta)$ be a prior law for the joint parameters of the logistic model. Then the posterior marginal law for $\beta$ is the same under both prospective and retrospective likelihood for all sample sizes $n$, and all possible observations $(x^{(n)}, y^{(n)})$, if and only if

\[ \beta \perp\!\!\!\perp \theta_X \quad \text{and} \quad \beta \perp\!\!\!\perp \theta_Y \quad [\mathcal{L}]. \tag{7} \]

Proof. First, the marginal posterior densities for $\beta$ can be written as

\[ \pi^{\mathrm{pro}}(\beta \mid x^{(n)}, y^{(n)}) \propto \pi(\beta) \, p(y^{(n)} \mid x^{(n)}, \beta), \]

\[ \pi^{\mathrm{ret}}(\beta \mid x^{(n)}, y^{(n)}) \propto \pi(\beta) \, p(x^{(n)} \mid y^{(n)}, \beta), \]

where $p$ denotes the marginal model. Hence the marginal posteriors are equal if and only if the retrospective and prospective marginal likelihoods for $\beta$ are proportional, for $\pi(\beta) > 0$. In other words, whenever there exists a function $k$ such that

\[ p(x^{(n)} \mid y^{(n)}, \beta) = p(y^{(n)} \mid x^{(n)}, \beta) \, k(x^{(n)}, y^{(n)}). \tag{8} \]

These models are also related through the joint model

\[ p(x^{(n)}, y^{(n)} \mid \beta) = p(x^{(n)} \mid \beta) \, p(y^{(n)} \mid x^{(n)}, \beta) = p(y^{(n)} \mid \beta) \, p(x^{(n)} \mid y^{(n)}, \beta), \]

therefore (8) is equivalent to

\[ p(x^{(n)} \mid \beta) = p(y^{(n)} \mid \beta) \, k(x^{(n)}, y^{(n)}). \tag{9} \]

Since $X^{(n)} \perp\!\!\!\perp \beta \mid \theta_X$, we can write the marginal model for $X^{(n)} \mid \beta$ as

\[ p(x^{(n)} \mid \beta) = \int p(x^{(n)} \mid \theta_X) \, \pi(\theta_X \mid \beta) \, d\theta_X. \]

Therefore, if $\theta_X \perp\!\!\!\perp \beta$, then $p(x^{(n)} \mid \beta)$ must be constant in $\beta$, and similarly for $p(y^{(n)} \mid \beta)$ if $\theta_Y \perp\!\!\!\perp \beta$. Hence (7) implies (9).

To show the converse, suppose that (9) holds for all $n$ and values of $(x^{(n)}, y^{(n)})$. Since $p(x^{(n)} \mid \beta)$ is a density, it must be proportional to $k(x^{(n)}, y_0^{(n)})$, for any fixed $y_0^{(n)}$, and so $X^{(n)}$ is independent of $\beta$.

Now $p(x^{(n)} \mid \beta)$ is the density of a mixture of independent and identically distributed variables, and the mixing measure of such an infinite sequence is uniquely determined (Aldous, 1985, Lemma 2.15). It follows that $\pi(\theta_X \mid \beta)$ must be independent of $\beta$, and hence $\theta_X \perp\!\!\!\perp \beta$. The same argument holds for $\theta_Y$. □

Several authors have identified similar results. Notably, Müller & Roeder (1997) appear to have almost identified the conditions in (7), but then incorrectly claim that the "argument about the retrospective likelihood only carries over to posterior inference on $\beta$ if $\alpha$ and $\beta$ are independent and $\theta_X$ is not otherwise constrained". This misconception appears to be due to the fact that although there is a one-to-one mapping between $\alpha$ and $\theta_Y$, this mapping is itself dependent on $\beta$ through (3). Unfortunately, this means that the Dirichlet process mixture they propose does not satisfy the required properties.
Example 1. Any law $\mathcal{L}(\theta)$ with the property

\[ (\theta_X, \theta_Y) \perp\!\!\!\perp \beta \quad [\mathcal{L}] \]

satisfies (7). We can construct such a law from two arbitrary laws $\mathcal{L}_m(\theta)$ and $\mathcal{L}_o(\theta)$, on taking $\mathcal{L}$ to be the product law of their projections $\mathcal{L}_m(\theta_X, \theta_Y)$ and $\mathcal{L}_o(\beta)$. By Corollary 2 there will exist a $\theta$ with these marginals, and since

\[ \theta \simeq (\theta_X, \theta_Y, \beta), \]

such a law would be uniquely determined.

Unfortunately, such a law would probably not be all that useful, as it would still require computing the integral

\[ p(y = 1 \mid x, \beta) = \int_{\Theta_X \times \Theta_Y} \frac{e^{\alpha(\beta, \theta_X, \theta_Y) + \beta^T x}}{1 + e^{\alpha(\beta, \theta_X, \theta_Y) + \beta^T x}} \, d\mathcal{L}_m(\theta_X, \theta_Y), \]

which may not be any easier than the retrospective likelihood.

In order to avoid the need to compute such integrals, we can require $\alpha$ and $\theta_X$ to be independent, such as in strong hyper Markov laws.
288 
Corollary 4. If $\mathcal{L}(\theta)$ is strong hyper Markov, that is,

\[ (\alpha, \beta) \perp\!\!\!\perp \theta_X \quad \text{and} \quad (\theta_{X|Y=0}, \beta) \perp\!\!\!\perp \theta_Y \quad [\mathcal{L}], \]

then the posterior law for $\beta$ is the same under both the prospective and the retrospective likelihood.

For the case that X is finite, conditions equivalent to the strong hyper Markov property were 
shown to be sufficient in a 2007 University of Bristol technical report by A.-M. Staicu. 

The problem of model comparison for case-control studies has received comparatively little attention in the literature, particularly for Bayesian analyses. However, we can derive a result similar to that of Theorem 2.

Theorem 3. If $\mathcal{L}_1(\theta)$ and $\mathcal{L}_2(\theta)$ have the same marginal laws for $\theta_X$ and $\theta_Y$, then the Bayes factor between the prospective models is equal to the Bayes factor between the retrospective models.

Proof. Let $M$ take values 1 and 2 each with probability 1/2, and, given $M = j$, let the conditional law of $\theta$ be $\mathcal{L}_j$. In the resulting joint law $\mathcal{L}^*$ for $(\theta, M)$, when the conditions of the theorem hold we shall have

\[ M \perp\!\!\!\perp \theta_X \quad \text{and} \quad M \perp\!\!\!\perp \theta_Y \quad [\mathcal{L}^*]. \]

By the same argument as for Theorem 2, the posterior probabilities, and hence the Bayes factors, must be equal. □



4. Strong hyper Markov laws for logistic regression 

Given the results of Corollary 4, we now investigate various strong hyper Markov laws for use as prior laws in case-control studies.

4-1. A single binary covariate
In the case of a single binary covariate, $\mathcal{X} = \{0, 1\}$, the logistic model is just a reparametrization of the 2 × 2 contingency table.

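Example 2. The simplest strong hyper Markov law for this model is the Dirichlet law $\mathcal{L}(\theta) = \mathcal{D}(a_{xy})$, with density

\[ \pi(\theta) = \frac{1}{B(a_{00}, a_{01}, a_{10}, a_{11})} \, \theta_{00}^{a_{00}-1} \theta_{01}^{a_{01}-1} \theta_{10}^{a_{10}-1} \theta_{11}^{a_{11}-1}, \tag{10} \]

where $\theta_{xy} = p(X = x, Y = y \mid \theta)$. This law has been well explored in the literature, in particular by Altham, who investigated the log odds ratio parameter, and was later used in the context of case-control studies by Zelen & Parker (1986), Nurminen & Mutanen (1987), Marshall (1988) and Ashby et al. (1993).

By reparametrizing $\theta_{xy} = \frac{e^{y(\alpha + \beta x)}}{1 + e^{\alpha + \beta x}} \theta_{x+}$, we find $\mathcal{L}(\theta_X) = \mathcal{D}(a_{0+}, a_{1+})$, and

\[ \pi(\alpha, \beta) \propto \frac{e^{\alpha a_{01}} \, e^{(\alpha + \beta) a_{11}}}{(1 + e^{\alpha})^{a_{0+}} (1 + e^{\alpha + \beta})^{a_{1+}}}. \]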

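However, the family of strong hyper Markov laws on 2 × 2 tables is more general than this. Geiger & Heckerman (1997, equation 10) note that a law with full support is strong hyper Markov, which they term "global parameter independence", if and only if it has a density of the form

\[ \pi(\theta) \propto h\!\left( \frac{\theta_{00} \theta_{11}}{\theta_{01} \theta_{10}} \right) \theta_{00}^{a_{00}-1} \theta_{01}^{a_{01}-1} \theta_{10}^{a_{10}-1} \theta_{11}^{a_{11}-1}, \tag{11} \]

for a positive Lebesgue integrable function $h$. The corresponding density of $\mathcal{L}(\alpha, \beta)$ is

\[ \pi(\alpha, \beta) \propto g(\beta) \, \frac{e^{\alpha a_{01}} \, e^{(\alpha + \beta) a_{11}}}{(1 + e^{\alpha})^{a_{0+}} (1 + e^{\alpha + \beta})^{a_{1+}}}, \]

where $g(\beta) = h(e^{\beta})$.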

4-2. Finite covariate space 

A more general case is where $\mathcal{X}$ is larger but still finite, for example a model with multiple categorical covariates. Prior specification is now not so simple: the proportional odds constraint implies that the logistic model will be confined to a submanifold of the probability simplex of the full $|\mathcal{X}| \times 2$ contingency table.

We solve this problem by adapting the conditioning procedure of Dawid & Lauritzen (2001, section 4) for constructing laws on nested models, by firstly choosing an arbitrary strong hyper Markov law $\mathcal{L}'(\theta)$ for the saturated model on $\mathcal{X} \times \{0, 1\}$, and then constructing the law $\mathcal{L}$ from $\mathcal{L}'$ conditional on satisfying the proportional odds requirement.

As Dawid & Lauritzen (2001) emphasized, the Borel-Kolmogorov paradox shows that there is no unique way to condition on a submodel. Furthermore, in selecting the method of conditioning, we need to ensure that it preserves the strong hyper Markov property.

We assume that there exist $x_1, \ldots, x_{k+1} \in \mathcal{X}$ such that $(1, x_1), (1, x_2), \ldots, (1, x_{k+1})$ are linearly independent, since otherwise $\beta$ is not identifiable. We can reparametrize the saturated model as

\[ p(y \mid x, \alpha, \beta, \eta) = \frac{e^{y(\alpha + \beta^T x + \eta_x)}}{1 + e^{\alpha + \beta^T x + \eta_x}}, \]

where $\eta_x = 0$ if $x = x_1, \ldots, x_{k+1}$. Then $\theta_{Y|X} \simeq (\alpha, \beta, \eta)$ and $\theta_{X|Y} \simeq (\theta_{X|Y=0}, \beta, \eta)$, and hence if $\mathcal{L}'$ is strong hyper Markov:

\[ (\alpha, \beta, \eta) \perp\!\!\!\perp \theta_X \quad \text{and} \quad (\theta_{X|Y=0}, \beta, \eta) \perp\!\!\!\perp \theta_Y \quad [\mathcal{L}']. \]

Note that the logistic model is the manifold defined by $\eta = 0$. Furthermore,

\[ (\alpha, \beta) \perp\!\!\!\perp \theta_X \mid \eta \quad \text{and} \quad (\theta_{X|Y=0}, \beta) \perp\!\!\!\perp \theta_Y \mid \eta \quad [\mathcal{L}']. \]

Hence $\mathcal{L}(\theta)$ defined as $\mathcal{L}'(\theta \mid \eta = 0)$ is a strong hyper Markov law for the logistic model.

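To begin this construction we require a strong hyper Markov law for the saturated model. One possibility is to extend (11) to larger 2-way tables.

Theorem 4. If a law $\mathcal{L}(\theta)$ for a 2-way contingency table $(X, Y)$ on $\mathcal{X} \times \mathcal{Y}$ has a density of the form

\[ \pi(\theta) \propto h\!\left\{ \left( \frac{\theta_{xy} \, \theta_{x^* y^*}}{\theta_{x y^*} \, \theta_{x^* y}} \right)_{x \neq x^*,\, y \neq y^*} \right\} \prod_{x, y} \theta_{xy}^{a_{xy}-1}, \tag{12} \]

for some $x^* \in \mathcal{X}$, $y^* \in \mathcal{Y}$ and a positive Lebesgue integrable function $h$, then it is strong hyper Markov.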
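Proof. Define $\theta_{+y} = p(Y = y \mid \theta)$ and $\theta_{x|y} = p(X = x \mid Y = y, \theta)$. Then the Jacobian determinant of the transformation $\theta_{xy} \to (\theta_{+y}, \theta_{x|y})$ is

\[ \left| \frac{\partial \theta_{xy}}{\partial(\theta_{+y}, \theta_{x|y})} \right| = \prod_{y} \theta_{+y}^{|\mathcal{X}|-1}, \]

which gives the joint density for $(\theta_{+y}, \theta_{x|y})$:

\[ \pi(\theta_{+y}, \theta_{x|y}) \propto \prod_{y} \theta_{+y}^{a_{+y}-1} \times h\!\left\{ \left( \frac{\theta_{x|y} \, \theta_{x^*|y^*}}{\theta_{x|y^*} \, \theta_{x^*|y}} \right)_{x \neq x^*,\, y \neq y^*} \right\} \prod_{x, y} \theta_{x|y}^{a_{xy}-1}. \]

This factorizes into a term involving only $\theta_{+y}$ terms, and another involving only $\theta_{x|y}$ terms, and therefore $\theta_Y \perp\!\!\!\perp \theta_{X|Y}$. By symmetry, the same argument holds in the other direction. □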



Theorem 4 can be viewed as the Bayesian counterpart to the theorem of Altham (1970), that the cross-ratio of a 2-way contingency table is variation independent of the marginal distributions.

It is unclear if the converse is true, i.e. if (12) characterizes all possible strong hyper Markov laws with full support. The corresponding result for (11) relies on results from functional equations, and these arguments cannot be easily extended directly to higher dimensions.

Applying the conditioning approach to this law leads to the following law for the logistic model.

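Example 3. We know from Theorem 4 that densities of the form

\[ \pi(\theta) \propto h\!\left\{ \left( \frac{\theta_{x1} \, \theta_{x^* 0}}{\theta_{x0} \, \theta_{x^* 1}} \right)_{x \neq x^*} \right\} \prod_{x \in \mathcal{X}} \theta_{x0}^{a_{x0}-1} \theta_{x1}^{a_{x1}-1} \]

for some arbitrary $x^* \in \mathcal{X}$, are strong hyper Markov for the full $|\mathcal{X}| \times 2$ contingency table model. The Jacobian determinant of the above transformation is

\[ \left| \frac{\partial \theta_{Y|X}}{\partial(\alpha, \beta, \eta)} \right| \propto \prod_{x \in \mathcal{X}} \frac{e^{\alpha + \beta^T x + \eta_x}}{(1 + e^{\alpha + \beta^T x + \eta_x})^2}, \]

and hence the density for $\mathcal{L}'(\alpha, \beta, \eta)$ is of the form

\[ \pi'(\alpha, \beta, \eta) \propto h\!\left\{ \left( e^{\beta^T (x - x^*) + \eta_x - \eta_{x^*}} \right)_{x \neq x^*} \right\} \prod_{x \in \mathcal{X}} \frac{e^{(\alpha + \beta^T x + \eta_x) a_{x1}}}{(1 + e^{\alpha + \beta^T x + \eta_x})^{a_{x+}}}. \]

By conditioning on $\eta_x = 0$ for all $x \in \mathcal{X}$, we obtain the density of $\mathcal{L}(\alpha, \beta)$:

\[ \pi(\alpha, \beta) \propto g(\beta) \prod_{x \in \mathcal{X}} \frac{e^{(\alpha + \beta^T x) a_{x1}}}{(1 + e^{\alpha + \beta^T x})^{a_{x+}}}, \tag{13} \]

where $g(\beta) = h\{ (e^{\beta^T (x - x^*)})_{x \neq x^*} \}$.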

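The Jacobian of the transformation in terms of the retrospective parameters is

\[ \left| \frac{\partial(\alpha, \beta, \theta_X)}{\partial(\theta_{X|0}, \beta, \gamma)} \right| = \frac{(1 - \gamma)^{|\mathcal{X}|-1}}{\gamma} \prod_{x \in \mathcal{X}} \left( 1 + e^{\alpha + \beta^T x} \right), \]

and so the density of $\mathcal{L}(\theta_{X|0}, \beta)$ is

\[ \pi(\theta_{X|0}, \beta) \propto g(\beta) \, \frac{\prod_{x \in \mathcal{X}} \theta_{x|0}^{a_{x+}-1} \, e^{a_{x1} \beta^T x}}{\left( \sum_{x \in \mathcal{X}} \theta_{x|0} \, e^{\beta^T x} \right)^{a_{+1}}}. \]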

There are other ways to perform such a conditioning operation, such as using the odds ratio, but $\eta$ has the desirable property of being invariant to the choice of $x^*$ and $x_1, \ldots, x_{k+1}$.

The prior from Staicu (2010, Example 2) is obtained on rewriting (13) as

\[ \pi(\alpha, \beta) \propto g^*(\beta) \, \frac{e^{\alpha a_{+1}}}{\prod_{x \in \mathcal{X}} (1 + e^{\alpha + \beta^T x})^{a_{x+}}}, \]

where $g^*(\beta) = g(\beta) \exp\left( \sum_{x \in \mathcal{X}} a_{x1} \beta^T x \right)$. On taking the limit as $a_{+1} \to 0$ we obtain the improper prior of Seaman & Richardson (2004) and Staicu (2010, Example 1).



However, we argue that the form of (13) is more easily interpreted: it can be thought of as the product of an improper prior with density $g(\beta) \, d\beta \, d\alpha$ and a logistic likelihood function, where the $a_{xy}$ represent pseudo-counts. This has the further benefit of making it easy to adapt existing computational methods: for example, a Laplace approximation can be found using regular logistic regression software.
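To make the pseudo-count interpretation concrete, the following sketch finds the mode and curvature of such a density by treating the $a_{xy}$ as (possibly non-integer) counts in an ordinary logistic fit. It assumes a single numeric covariate and a constant $g$; the pseudo-counts and function names are hypothetical choices for illustration.

```python
import math

def score_and_information(alpha, beta, pseudo):
    """Score vector and information matrix of the pseudo-count log-density.

    pseudo[x] = (a_x0, a_x1) holds the pseudo-counts of controls and cases at x.
    """
    g_a = g_b = h_aa = h_ab = h_bb = 0.0
    for x, (a0, a1) in pseudo.items():
        n = a0 + a1
        p = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))
        g_a += a1 - n * p
        g_b += (a1 - n * p) * x
        w = n * p * (1.0 - p)
        h_aa += w
        h_ab += w * x
        h_bb += w * x * x
    return (g_a, g_b), (h_aa, h_ab, h_bb)

def laplace_approximation(pseudo, iterations=50):
    """Mode and approximate covariance of (alpha, beta), assuming g is constant."""
    alpha = beta = 0.0
    for _ in range(iterations):
        (g_a, g_b), (h_aa, h_ab, h_bb) = score_and_information(alpha, beta, pseudo)
        det = h_aa * h_bb - h_ab * h_ab
        alpha += (h_bb * g_a - h_ab * g_b) / det      # Newton-Raphson update
        beta += (h_aa * g_b - h_ab * g_a) / det
    # Covariance of the Laplace approximation: inverse information at the mode.
    _, (h_aa, h_ab, h_bb) = score_and_information(alpha, beta, pseudo)
    det = h_aa * h_bb - h_ab * h_ab
    covariance = [[h_bb / det, -h_ab / det], [-h_ab / det, h_aa / det]]
    return (alpha, beta), covariance

# Hypothetical non-integer pseudo-counts on the covariate space {0, 1, 2}.
pseudo = {0.0: (1.5, 0.5), 1.0: (1.0, 1.0), 2.0: (0.5, 1.5)}
mode, covariance = laplace_approximation(pseudo)
score_at_mode, _ = score_and_information(mode[0], mode[1], pseudo)
```

For these symmetric pseudo-counts the mode is $(\alpha, \beta) = (-\log 3, \log 3)$, which can be confirmed by solving the score equations by hand.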

Although $\mathcal{X}$ appears in the density of $\mathcal{L}(\alpha, \beta)$, we disagree with Staicu (2010) that this constitutes a covariate dependent prior, such as the g-priors of Zellner (1986): it is dependent on the a priori expected frequency of the covariates, and not the observed frequency of the covariates in the data.

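This law can itself be constructed as the posterior of a beta prior law.

Proposition 1. For each $x \in \mathcal{X}$, let

\[ \tau_x = \frac{e^{\alpha + \beta^T x}}{1 + e^{\alpha + \beta^T x}}. \]

For some $x_1, \ldots, x_{k+1} \in \mathcal{X}$ such that $(1, x_1), (1, x_2), \ldots, (1, x_{k+1})$ are linearly independent, let $\mathcal{L}'(\theta)$ be the product law of the marginal laws

\[ \mathcal{L}'(\tau_{x_i}) = \mathcal{B}(a_{x_i 1}, a_{x_i 0}). \]

For all other $x \neq x_1, \ldots, x_{k+1}$, let

\[ \mathcal{L}'(Z_x \mid \theta) = \mathrm{Binomial}(a_{x+}, \tau_x). \]

Then the posterior law $\mathcal{L}'(\theta \mid Z_x = a_{x1})$ will have density of the form (13), where $g$ is constant.

Proof. The prior law $\mathcal{L}'(\alpha, \beta)$ will have density proportional to

\[ \prod_{i=1}^{k+1} \frac{e^{(\alpha + \beta^T x_i) a_{x_i 1}}}{(1 + e^{\alpha + \beta^T x_i})^{a_{x_i +}}}. \]

Likewise the likelihood of $(Z_x = a_{x1})_{x \neq x_1, \ldots, x_{k+1}}$ will be proportional to

\[ \prod_{x \neq x_1, \ldots, x_{k+1}} \frac{e^{(\alpha + \beta^T x) a_{x1}}}{(1 + e^{\alpha + \beta^T x})^{a_{x+}}}. \qquad \square \]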

This is particularly useful for implementing such procedures in generic Bayesian MCMC packages such as WinBUGS, OpenBUGS and JAGS: note that these packages happily accept non-integer values for binomial counts. Furthermore, arbitrary functions $g$ can be included by use of the "zero Poisson" trick: see Spiegelhalter et al. (2003, "Specifying a new sampling distribution").

Unfortunately, this method is somewhat impractical for large numbers of covariates. In particular, we note that the size of $\mathcal{X}$ increases exponentially with its dimensionality $k$. Furthermore, as $|\mathcal{X}|$ increases, $\beta$ will tend to concentrate around 0. To compensate for this, the values of $(a_{xy})$ can be chosen closer to 0, but unfortunately, the above software packages tend not to work well, if at all, for very small values.



5. Stratified case-control studies 

A more complicated case is that of stratified or matched case-control studies, in which participants are selected by both the outcome $Y$ and an additional stratum variable $S$. Such a design can often estimate the odds-ratio of interest with much greater efficiency than an unstratified study.

The model is similar to that above, but with an intercept parameter that varies by stratum, so that the prospective model is

\[ p(y \mid x, s, \alpha, \beta) = \frac{e^{y(\alpha_s + \beta^T x)}}{1 + e^{\alpha_s + \beta^T x}}. \]

Unfortunately, this additional complication makes estimation more difficult. As the number of strata will increase with the sample size $n$, the usual maximum likelihood estimator is no longer consistent.

Instead, the standard classical approach seeks to maximize the conditional likelihood

\[ \prod_{s \in \mathcal{S}} \frac{\exp\left( \sum_{i \in I_s} y_i \beta^T x_i \right)}{\sum_{\tilde{y}} \exp\left( \sum_{i \in I_s} \tilde{y}_i \beta^T x_i \right)}, \]

where $I_s = \{i : s_i = s\}$, and the summation in the denominator is over the possible permutations $\tilde{y}$ of $(y_i)_{i \in I_s}$.

If there are $a$ cases and $b$ controls in each stratum, called $a$:$b$ matching, the sum in the denominator will have $\binom{a+b}{a}$ terms. In order to keep this computationally tractable, most studies use 1:1 or 1:$m$ matching.
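The growth of the denominator can be seen in a direct implementation. The sketch below evaluates the conditional likelihood contribution of a single stratum by enumerating every way of assigning the observed number of cases to its members; it assumes a scalar covariate, and the function names and data are hypothetical.

```python
import math
from itertools import combinations

def stratum_conditional_likelihood(xs, ys, beta):
    """Conditional likelihood contribution of one stratum (scalar covariate).

    xs: covariate values of the stratum members; ys: their 0/1 outcomes.
    The denominator sums over every assignment of the same number of cases
    to the members of the stratum.
    """
    n_cases = sum(ys)
    numerator = math.exp(beta * sum(x for x, y in zip(xs, ys) if y == 1))
    denominator = sum(math.exp(beta * sum(subset))
                      for subset in combinations(xs, n_cases))
    return numerator / denominator

def n_terms(a, b):
    """Number of terms in the denominator under a:b matching."""
    return math.comb(a + b, a)

# A hypothetical 1:1 matched pair: the case has x = 1.2, the control x = 0.4.
beta = 0.5
pair_value = stratum_conditional_likelihood([1.2, 0.4], [1, 0], beta)
pair_expected = math.exp(0.6) / (math.exp(0.6) + math.exp(0.2))

terms_1_1 = n_terms(1, 1)
terms_2_3 = n_terms(2, 3)
terms_10_10 = n_terms(10, 10)
```

Under 1:1 matching each stratum contributes only two terms, whereas 10:10 matching already requires 184756 terms per stratum, which is why small matching ratios dominate in practice.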

However, for a Bayesian analysis the conditional likelihood does not have a direct interpretation. Rice (2004, Theorem 1) showed there exists a law such that the marginal retrospective likelihood $p(x \mid y, s, \beta)$ is proportional to the conditional likelihood. However, such a law depends on the matching scheme: e.g. a 1:1 matched design will require a different law than a 1:2 matched design.

Instead, we extend Theorem 2 to find conditions under which we can use the prospective likelihood for any matching scheme.

Theorem 5. Let $\mathcal{L}(\theta_{XY|S})$ be a prior law for the parameters of the stratified logistic model. Then the posterior marginal law for $\beta$ is the same under both the prospective and the retrospective likelihood, for all possible observations $(x^{(n)}, y^{(n)}, s^{(n)})$, if and only if

\[ \beta \perp\!\!\!\perp \theta_{X|S} \quad \text{and} \quad \beta \perp\!\!\!\perp \theta_{Y|S} \quad [\mathcal{L}]. \]

Proof. The argument is essentially the same as that of Theorem 2, noting that $\theta_{X|S}$ and $\theta_{Y|S}$ are the joint distributions for the random vectors $(X \mid S = s)_{s \in \mathcal{S}}$ and $(Y \mid S = s)_{s \in \mathcal{S}}$, respectively. □

To construct such laws, we use a conditioning procedure similar to that in the previous section. First, for each stratum $s$, let $\mathcal{L}_s(\theta_{XY|S=s})$ be a law satisfying Theorem 2, where $\theta_{Y|X,S=s} \simeq (\alpha_s, \beta_s)$. Then define $\mathcal{L}^*(\theta_{XY|S})$ to be the product law $\prod_s \mathcal{L}_s$, and therefore

\[ \theta_{X|S} \perp\!\!\!\perp (\beta_s)_{s \in \mathcal{S}} \quad \text{and} \quad \theta_{Y|S} \perp\!\!\!\perp (\beta_s)_{s \in \mathcal{S}} \quad [\mathcal{L}^*]. \]

This can be reparametrized in terms of $(\beta, (\tau_s)_{s \neq s^*}) \simeq (\beta_s)_{s \in \mathcal{S}}$, where $\beta = \beta_{s^*}$ for some stratum $s^*$, and $\tau_s = \beta_s - \beta$ for each $s \neq s^*$. Finally, we condition on $\tau_s = 0$. Since

\[ \theta_{X|S} \perp\!\!\!\perp \beta \mid (\tau_s)_{s \neq s^*} \quad \text{and} \quad \theta_{Y|S} \perp\!\!\!\perp \beta \mid (\tau_s)_{s \neq s^*} \quad [\mathcal{L}^*], \]

it follows that $\mathcal{L}(\theta_{XY|S})$ defined as $\mathcal{L}^*(\theta_{XY|S} \mid \tau = 0)$ will satisfy the conditions of Theorem 5.

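Example 4. If we let each $\mathcal{L}_s(\alpha_s, \beta_s)$ be of the form in Example 3, with function $g_s$ and parameters $a_{xys}$, the density for the law $\mathcal{L}^*(\alpha, \beta, \tau)$ will be of the form

\[ \prod_{s \in \mathcal{S}} g_s(\beta + \tau_s) \prod_{x \in \mathcal{X}} \frac{e^{(\alpha_s + (\beta + \tau_s)^T x) a_{x1s}}}{(1 + e^{\alpha_s + (\beta + \tau_s)^T x})^{a_{x+s}}}, \]

taking $\tau_{s^*} = 0$. Conditioning on $\tau = 0$ gives a density for $\mathcal{L}(\alpha, \beta)$ as

\[ g(\beta) \prod_{(x, s) \in \mathcal{X} \times \mathcal{S}} \frac{e^{(\alpha_s + \beta^T x) a_{x1s}}}{(1 + e^{\alpha_s + \beta^T x})^{a_{x+s}}}, \]

where $g(\beta) = \prod_{s \in \mathcal{S}} g_s(\beta)$. This is of the same form as the density (13), where the strata are treated as an additional categorical covariate in the model. Furthermore, the marginal laws $\mathcal{L}(\alpha_s, \beta)$ will also be of this form, and the stratum parameters $(\alpha_s)_{s \in \mathcal{S}}$ will be conditionally independent given $\beta$. Moreover, if the parameters are the same across strata (i.e. $a_{xys} = a_{xys'}$), then these stratum parameters are exchangeable, which could be a reasonable assumption in many analyses.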

We have not specified a model for the stratum variable $S$, as we have assumed all data are observed conditional on $S$. However, under the additional assumption

\[ \theta_{XY|S} \perp\!\!\!\perp \theta_S \quad [\mathcal{L}], \]

the data can be treated as if they were randomly sampled from the population, as would hold for a cross-sectional study.

6. Discussion

A natural question is how to extend the above laws to the case where $\mathcal{X}$ is infinite, for example where a covariate is continuous. One obvious choice would be to replace the Dirichlet law for $\mathcal{L}(\theta_X)$ with a Dirichlet process. However, the resulting density for $\mathcal{L}(\theta_{Y|X})$ in (13) would involve an infinite product, making it difficult to apply the standard Dirichlet process machinery of taking projections onto finite partitions of $\mathcal{X}$, and appealing to the Kolmogorov extension theorem.

There is potential for these techniques to be successfully applied to other models. In particular, the stratified case-control model is closely related to the Rasch model, commonly used in psychometrics for measuring ability or attitudes of individuals based on tests and questionnaires.